Model Serving, GPU Clusters, Inference Optimization, MLOps
Analog IMC Attention Mechanism For Fast And Energy-Efficient LLMs (FZJ, RWTH Aachen)
semiengineering.com·2h
Scaling high-performance inference cost-effectively
cloud.google.com·5d